Introduction

    Students and parents alike often believe that academic success in school is determined purely by the duration or quality of studying when, in reality, a plethora of other factors can affect a student’s performance. Although it’s logical to assume that persistent and productive studying will always lead to better academic performance, that assumption overlooks the fact that students’ personal lives and extracurricular activities also play a key part in their will or ability to study, and in turn in their academic performance.

    This raises the question many may ask next: “Then, what other factors affect academic performance?” I’ll attempt to answer it using a data set provided by UC Irvine’s Machine Learning Repository. During the 2005-2006 school year, in the Alentejo (Ah-len-TAY-zhoo) region of Portugal, researchers Paulo Cortez and Alice Silva collected data from Gabriel Pereira School and Mousinho da Silveira School using school reports and questionnaires. Using their results, they authored a report in an attempt to develop “more efficient student prediction tools…[improve] the quality of education, and [enhance] school resource management” (Cortez and Silva).

Students at Gabriel Pereira School - AEGP

Using their compiled data, I will try to answer the following questions.

Research Questions

  1. How do the family circumstances of secondary students at Gabriel Pereira School and Mousinho da Silveira School affect their academic performance in math and Portuguese language classes?
  2. How do extracurricular and social activities of secondary students at Gabriel Pereira School and Mousinho da Silveira School relate to their academic performance?

Data Preparation

    After downloading the data set from the aforementioned UCI Machine Learning Repository and making initial observations, one of the earliest problems I faced was importing it properly. After some difficulty in understanding the file type, I discovered that it’d be best to read the provided data sets with a “delim()”-family function (such as “read.delim()” or “read_delim()”) so that the semicolon-separated columns could load properly and be ready for alteration and analysis.
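As a minimal sketch of that import step, the snippet below writes a tiny semicolon-separated file and reads it back with base R’s read.delim(). The file contents and the name math_raw are made up for illustration; the real files from the repository have many more columns, and a different reader may have been used in the original analysis.

```r
# Write a tiny semicolon-separated file to a temporary location.
# (Illustrative rows only, not taken from the real data set.)
path <- tempfile(fileext = ".csv")
writeLines(c("school;sex;age;G3",
             "GP;F;17;10",
             "MS;M;16;14"), path)

# sep = ";" tells read.delim() to split columns on semicolons
# instead of its default tab character.
math_raw <- read.delim(path, sep = ";", header = TRUE)

# Inspect the column names and types that were detected.
str(math_raw)
```

If the separator is left at its default, every row would load as a single column, which is one way an import of this kind can go wrong.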

    The biggest problem I faced resulted from a small note left by the authors stating that some students appear in both data sets. This was an issue since I had planned from the beginning to merge the data sets to analyze the average of the variables. Fortunately, the authors provided a line of code that helped me determine which observational units were duplicates and how many of them there were, so I was able to use the “full_join()” function to combine the two data sets according to the given criteria. I then filtered for the needed variables but ran into more difficulty: because five of the eight variables I was analyzing were not part of the matching criteria, the join had produced two columns for each of them. Ultimately, I consolidated those duplicated columns using the “coalesce()” function and then filtered further to organize the final data set.
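The merge described above can be sketched roughly as follows, assuming the dplyr package is installed. The two small tables, the reduced set of matching columns in keys, and the way G3_avg is computed here are all assumptions made for illustration; the real data sets use a longer list of matching columns supplied by the authors.

```r
library(dplyr)

# Illustrative stand-ins for the two subject data sets.
math <- tibble(school = c("GP", "GP"), sex = c("F", "M"), age = c(17, 16),
               famrel = c(4, 5), G3 = c(10, 14))
portuguese <- tibble(school = c("GP", "MS"), sex = c("F", "F"), age = c(17, 18),
                     famrel = c(4, 3), G3 = c(12, 11))

# Hypothetical, shortened matching criteria.
keys <- c("school", "sex", "age")

merged <- full_join(math, portuguese, by = keys,
                    suffix = c(".math", ".por")) %>%
  mutate(
    # Non-key columns come back twice; take the first non-missing value.
    famrel = coalesce(famrel.math, famrel.por),
    # Average the final grades that are available for each student.
    G3_avg = rowMeans(cbind(G3.math, G3.por), na.rm = TRUE)
  ) %>%
  select(all_of(keys), famrel, G3_avg)

merged
```

A full join keeps students who appear in only one data set, which is why coalesce() is needed: for those students one of the two duplicated columns is missing.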

Variable Descriptions

## # A tibble: 7 × 4
##   variable   type        desc.                                        values    
##   <chr>      <chr>       <chr>                                        <chr>     
## 1 Pstatus    Categorical parent's cohabitation status                 together/…
## 2 famsup     Categorical family educational support                   Y/N       
## 3 famrel     Numeric     quality of family relationships              1-5       
## 4 activities Categorical extra-curricular activities                  Y/N       
## 5 romantic   Categorical in romantic relationship                     Y/N       
## 6 goout      Categorical frequency of going out with friends          1-5       
## 7 G3_avg     Numeric     average of all grades in Portuguese and Math 1-20

Analysis

First, before seeing how these variables interact with academic performance, I believe it’s important to at least view the distribution of the grades themselves.

The distribution of grades is very slightly left skewed, as can be seen above. The median final grade is 11 and the IQR (interquartile range), which represents the range of the middle 50% of the data, is 4. This illustrates that the middle half of students are earning grades between 9 and 13 out of 20. Scaled proportionally to a 100-point scale like the one used in American schools, this would be between 45 and 65 out of 100. Notably, a considerable number of students failed completely.
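To make the median, IQR, and scale conversion concrete, here is a small sketch in base R. The grades vector is invented for illustration and was chosen so that it reproduces the same summary values as the paragraph above; it is not the real data.

```r
# Illustrative grades on the 0-20 scale (not the real data).
grades <- c(0, 8, 9, 10, 11, 12, 13, 14, 16)

med <- median(grades)                                   # middle value
quartiles <- unname(quantile(grades, c(0.25, 0.75)))    # Q1 and Q3
iqr <- IQR(grades)                                      # Q3 minus Q1

# Rescale the quartiles from a 20-point to a 100-point scale.
quartiles_100 <- quartiles * 100 / 20

med            # 11
quartiles      # 9 13
iqr            # 4
quartiles_100  # 45 65
```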

Analyses of the Effect of Family Circumstances

For this analysis, we’ll be considering the variables “Pstatus”, “famsup”, and “famrel” and their relation to “G3_avg”, with specific emphasis on “famrel”. Studies such as this one from the National Library of Medicine (Deng et al.) have underlined the role of unhealthy family relationships in lowering academic performance through a variety of psycho-physiological causes.

Based on the calculated data, there were 599 students whose parents live together and 83 students whose parents live separately.

Based on the calculated data, there were 414 students who received educational support from their family and 268 students who didn’t.
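Counts like the ones above can be obtained by tabulating a categorical column. The sketch below uses base R’s table() on a made-up famsup column with the Y/N coding from the variable descriptions; the values are illustrative only.

```r
# Illustrative Y/N values (not the real data).
students <- data.frame(famsup = c("Y", "N", "Y", "Y", "N"))

# table() counts how many rows fall into each category.
support_counts <- table(students$famsup)

support_counts[["Y"]]  # 3 students with family educational support
support_counts[["N"]]  # 2 students without
```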

The distribution of family relationship ratings is left skewed, as can be seen above. The median rating is 4 and the IQR is 1. This shows that most students generally have positive relationships with their family members.

According to the graph, the median grade of students whose parents live apart is 11 and the IQR is 4. For students whose parents live together, the median grade is also 11 and the IQR is also 4. Unexpectedly, this shows that students’ grades appear almost identical regardless of whether their parents live together or apart.

According to the graph, the median grade of students who don’t receive educational support from their family is 11 and the IQR is 5. For students who do receive that support, the median grade is 11 and the IQR is 4. Again, unexpectedly, this shows that family educational support did not have much bearing on students’ grades. Strangely, the grades of students without educational support varied slightly more (an IQR of 5 versus 4), with the additional spread toward higher grades.

The median and IQR of students’ grades, grouped by how they rated their family relationships, can be seen below.

## # A tibble: 5 × 3
##   famrel median   IQR
##    <dbl>  <dbl> <dbl>
## 1      1     10     4
## 2      2     11     5
## 3      3     11     4
## 4      4     11     5
## 5      5     11     5

This shows that grades are generally consistent across all levels of family relationship quality. However, as can be seen in the graph and table, there appears to be a slight positive association between better family relationships and higher academic performance: the median grade is 10 for students who gave the lowest rating and 11 for every other rating.
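A grouped summary of this kind can be sketched with dplyr’s group_by() and summarise(), assuming the dplyr package is installed. The data below are invented and cover only two rating levels, so the numbers do not match the table above.

```r
library(dplyr)

# Illustrative data only: two rating levels with three students each.
students <- tibble(
  famrel = c(1, 1, 1, 5, 5, 5),
  G3_avg = c(8, 10, 12, 9, 11, 16)
)

# One row per rating level, with the median and IQR of the grades in it.
by_famrel <- students %>%
  group_by(famrel) %>%
  summarise(median = median(G3_avg), IQR = IQR(G3_avg))

by_famrel
```

The same pattern applies to any of the 1-5 rating variables by swapping the grouping column.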

Analyses of the Effect of Extracurricular and Social Activities

Based on the calculated data, there were 332 students who did participate in after-school extracurricular activities and 350 students who didn’t.

Based on the calculated data, there were 252 students who were in romantic relationships and 430 students who weren’t.

The distribution of how often students go out with friends is roughly symmetric, as can be seen above. The mean frequency is 3.174 and the standard deviation, which measures roughly how far values typically fall from the mean, is 1. This illustrates that most students at GP and MS go out with their friends a moderate amount, around the middle of the 1-5 scale.
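As a brief illustration of what the mean and standard deviation measure, the sketch below computes both in base R and checks sd() against the sample formula written out by hand. The goout values are made up for illustration.

```r
# Illustrative 1-5 ratings (not the real data).
goout <- c(1, 2, 3, 3, 3, 4, 5)

avg <- mean(goout)   # arithmetic mean
spread <- sd(goout)  # sample standard deviation

# The same quantity by hand: square root of the summed squared
# deviations from the mean, divided by n - 1.
manual_sd <- sqrt(sum((goout - avg)^2) / (length(goout) - 1))

all.equal(spread, manual_sd)  # TRUE
```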

According to the graph, the median grade of students who didn’t participate in extracurricular activities is 11 and the IQR is 4. For students who did, the median grade is 11 and the IQR is 5. Since the medians are identical and only the spread differs, this suggests at most a slight positive association between participating in extracurricular activities and higher academic performance.

According to the graph, the median grade of students who didn’t claim to be in a romantic relationship is 11 and the IQR is 5. For students who did, the median grade is 11 and the IQR is 4. Although the medians are the same, the grades of students who aren’t in relationships vary slightly more, with the additional spread toward higher grades.

The median and IQR of students’ grades, grouped by how often they claimed to go out with their friends, can be seen below.

## # A tibble: 5 × 3
##   goout median   IQR
##   <dbl>  <dbl> <dbl>
## 1     1     10     4
## 2     2     12     4
## 3     3     11     5
## 4     4     10     5
## 5     5     10     4

This again shows that grades are generally consistent across the frequency of going out with friends. However, it appears that students who go out less often (ratings of 2 and 3) and presumably stay home to study have slightly higher median grades (12 and 11) than their peers who go out more often (10). That said, students who go out the least (a rating of 1) also have a median grade of 10, so going out very little does not by itself correspond to higher grades.

Choice Elements

  • Floating table of contents is included in the top left section of the HTML page. (ex: lines 6-10)
  • Merging data sets and process is included in data preparation section. I combined data sets “math” and “portuguese”. (ex: lines 55-58)
  • Hyperlinks are included in the introduction and conclusion. (exs: line 32 {report}, line 108 {this one})
  • In-line code is included in all graph descriptions. (exs: line 104, line 116, line 124, line 132, line 204)
  • Font coloring is included in all graph descriptions along with the in-line coding. (ex: line 106)

Conclusion

After examining the data, it appears that family circumstances, such as whether parents live together, the presence of family educational support, and the quality of family relationships, have a relatively modest effect on students’ academic performance in math and Portuguese. Similarly, students’ engagement in extracurricular and social activities, including participation in after-school activities, romantic relationships, and the frequency of going out with friends, does not seem to substantially influence their final grades, suggesting that these personal and social factors may play a smaller role than commonly assumed.

However, I think it’s important to consider the scope and context of the data in this regard, and I suggest that more information be collected across a variety of schools in different regions for the results to be more conclusive. In addition, I think it’d be interesting to further research why students who don’t receive educational support from their family appear to have slightly higher grades.

References

  • Cortez, Paulo, and Alice Silva. “Using Data Mining to Predict Secondary School Student Performance.” Repositorium, repositorium.uminho.pt/server/api/core/bitstreams/991a0e2b-249d-466d-afef-937d975ff7fc/content. Accessed 7 Oct. 2025.
  • Cortez, Paulo. “Student Performance.” UCI Machine Learning Repository, 2008, https://doi.org/10.24432/C5TG7T. Accessed 7 Oct. 2025.
  • Deng, Yuwei, et al. “Family and Academic Stress and Their Impact on Students’ Depression Level and Academic Performance.” Frontiers in Psychiatry, U.S. National Library of Medicine, 16 June 2022, pmc.ncbi.nlm.nih.gov/articles/PMC9243415/.
  • “Interescolas de Jogos Matemáticos.” AEGP, 22 Mar. 2019, https://aegp.edu.pt/web/pt-pt/interescolas-de-jogos-matematicos. Accessed 7 Oct. 2025.